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Abstract 

Despite the growing use of media in the classroom, the effects of using of audio 
versus video in pronunciation teaching has been largely ignored. To analyze the 
impact of the use of audio or video training on aural discrimination of vowels, 61 
participants (all students at a large American university) took a pre-test followed 
by two training sessions on a vowel contrast (/i/-/i/). One group received audio 
training and the other group received video training. The groups then took a 
post-test and delayed post-test to determine the impact of the training. Students’ 
reactions to the two training types were also obtained through a questionnaire. 
Results showed that while both groups improved significantly from the pre-test 
to both post-tests, there was no statistically significant difference between the 
video and audio groups. Results show that reactions were more favorable to the 
video training. 

Introduction 

Due to increased use of media in language classrooms, it is important to consider 
what effects these technologies might have in terms of student performance, 
motivation, and attitude. While technology has been recognized as an important 
teaching tool for pronunciation, most of the emphasis has been on developing 
automatic speech recognition, visual feedback, and software use (for examples 
see Banafa, 2008; Engwall & Balter, 2007; Hincks & Edlund, 2009; Levis & 
Pickering, 2004; Neri, Cucchiarini, & Strik, 2008; Wang & Munro, 2004). 

However, the effects of incorporating audio versus video in the classroom have 
largely been ignored. This is problematic because audio and video materials are 
commonly used, whether through teacher-recorded materials, websites, 
podcasts, or even software programs, and incorporated into classes without 
knowing whether the two have different impacts. This research study aims to 
investigate the effects of using audio and video pronunciation training on 
perception of English vowels. 
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Review of the Literature 

Numerous researchers have discussed the use of listening activities in improving 
pronunciation accuracy. For instance, MacCarthy (1976) described the process of 
second language learners "learning to listen" in the second language (L2): 

It should be made clear at this point that "learning to listen” has 
nothing to do with improving the physical properties of anyone’s 
actual hearing mechanism: phrases such as " to have a good ear for 
languages”, to "train" or "improve the ear", refer to the ear in a 
metaphorical sense only. Aside from those individuals who are 
unfortunate enough to have truly defective hearing (most of whom 
will not even try to learn foreign languages, let alone teach them), it 
can be assumed that the actual hearing of the language learner is 
adequate to his task. What is required, though, is that he should 
learn to direct his attention to the auditory phenomena of the new 
language, and that he should do so in a way quite different from the 
way he listens to and understands his own language: through long 
training in the mother tongue (quite properly) get into the habit of 
noticing by and large only the parts that are relevant to catching 
the meaning, and of paying no attention to-literally not noticing- 
anything else. (p. 213) 

For vowels in particular, learning to listen to English will often require that the 
student develop vowel categorizations appropriate for English. English vowels 
are distinguished by the following characteristics: tongue position and height 
within the oral cavity, lip rounding or spreading, tension, and gliding (vs. simple 
vowels). Length is not a distinguishing feature, but instead is influenced by the 
vowel’s environment (Celce-Murcia, Brinton, & Goodwin, 2010). 

Research has supported the notion that sounds in an L2 are filtered through the 
phonological system of the first language (LI) (Beddor & Strange, 1982; 
Blankenship, 1991; Flege, Munro, & Fox, 1993). Filtering sounds through the LI 
can lead an L2 learner to make distinctions that are inappropriate for the L2. 
English vowel pairings such as /i/ and /i/, /e/ and /z/, and /z/ and /ae/ are 
likely to be problematic for learners, because vowels with similar articulatory 
positions are difficult to discriminate. Learning to listen to the L2 then will entail 
a redefining of the vowel space to better reflect the vowel distinctions of English. 
However, filtering through the LI does not entirely explain the use of vowel 
errors for L2 learners. Bohn and Flege (1990) show that non-native speakers 
often rely on vowel length, even if they do not do so in their LI. Morrison (2008) 
explains this phenomenon by proposing that this use of length as a cue is a 
developmental stage, which may be based, at least in part, on the teaching of 
vowels as "long” and "short." This means that learning appropriate vowel 
categories will also entail a shift from focusing on the length of the vowel to 
quality of the vowel (spectral cues). 
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Learning to create aural discrimination categories based on spectral cues is not 
only important for comprehension, but it is also likely to play a role in 
production. Researchers have found that practice in perception can affect and 
improve production (Bradlow, 1997; Rochet, 1995; Rvachew, 1994). Thus, 
instruction must help students listen for and produce the articulatory differences 
of vowels and lead them away from a reliance on vowel length. One way of 
achieving this goal is through the use of listening exercises that utilize minimal 
pairs. Celce-Murcia, Brinton, and Goodwin state, "Since vowels are often more 
easily distinguished in relation to one another than by any external standard of 
measurement, the perception of contrasts is an essential starting point in the 
teaching of vowels” (2010, p. 138). These vowel discrimination exercises will 
help students learn to listen for the subtle differences among the vowels and will 
help students shift from utilizing length as the main cue. 

Two of the suggested activities using minimal pairs are "Same/Different” in 
which students hear two words and have to decide whether the two words are 
the same or different and "Which of the words is different?", in which students 
would hear three or four words and decide which word in the string is different 
(Celce-Murcia et al., 2010, p. 140) All of these activities utilize minimal pairs, 
which in the case of /i/-/i/ could include feet, fit; sleep, slip; and neat, knit. Other 
textbooks offer minimal pair activities such as giving the students a minimal pair 
and asking which word contains the specific, targeted vowel (Grant, 1993) or 
giving students a single word and asking them whether it includes the targeted 
vowel (Dalton & Seidlhofer, 1994). Celce-Murcia et al. (2010) also point out that 
in addition to helping students differentiate the vowels, these types of activities 
are likely to remind them that a single vowel mispronunciation can change the 
meaning of a word and lead to misunderstandings (p. 135). More importantly, 
research about the effectiveness of this type of training has shown that it leads to 
significant improvement in perception (Bradlow, Pisoni, Yamada, & Tohkura, 
1997; Logan, Lively, & Pisoni, 1991; McClaskey, Pisoni, & Carell, 1983; Strange & 
Dittmann, 1984; Pisoni, Aslin, Perey, & Hennessey, 1982). 

Although it seems clear that these activities are useful in developing appropriate 
vowel categories and increasing ability to discriminate vowels, there is more than 
one way to do these activities with students. Teachers can do these activities in 
class or they could assign the activities as video or audio files as homework. 
Because many schools do not offer separate pronunciation classes, many 
speaking and listening instructors have limited time to address pronunciation 
topics in the classroom. Assigning audio and video files as homework can make 
use of class time more efficient and effective. Understanding the differences 
between audio and video modes of training may also be useful to pronunciation 
software developers aiming to best meet the needs of their clients. Thus, it is 
important to evaluate the effects of the use of audio versus video in this type of 
training. 

There is reason to believe that video would be a helpful addition to usual 
pronunciation teaching. Information processing theory suggests that by using 
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both auditory and visual information a student is able to use dual-coding and 
access information through multiple routes (Bagui, 1998). A visual addition may 
also affect activation of auditory processing in the brain. It has been shown that a 
subject watching a person make speech-like mouth movements, even when there 
is no accompanying sound, is enough to activate the auditory cortex (Calvert et 
al., 1997). It has even been shown that the addition of visual cues can affect the 
sounds that listeners think they hear. MacDonald and McGurk (1978) studied the 
phenomenon that has come to be called the "McGurk Effect", in which a woman’s 
head is video recorded and video and the audio do not match. Instead, the video 
shows the woman saying the syllable [ba], but the audio accompanying the visual 
is of the woman pronouncing [ga]. Normal viewers/listeners, in result, thought 
they heard [da]. These results show a heavy impact of visual cues in the 
processing of speech. 

Early research in the effects of audio vs. video training with minimal pairs for 
discrimination of sounds has shown that video promotes increased acquisition. 
Hirata and Kelly (2010) examined the effects of training in Japanese vowel 
contrasts with 60 LI English speaking participants, which are created by length 
differences. They conducted training through audio alone, audio plus the visual of 
the person speaking the words, and audio plus visual of the person speaking 
words and using hand gestures. Results show that the added visual of the person 
saying the words improved perception more than audio training alone. This 
study seems to be the only of its kind to examine the effects of audio vs. video 
training on vowel contrasts. Much of the other research concerning the effects of 
audio vs. video training focuses on the /r/ and /l/ contrast in English, which 
suggests that video training in English /r/ and /l/ improves perception more 
than audio training alone (Bradlow, Pisoni, Yamada, & Tohkura, 1997; Hardison, 
2003; Hardison, 2005). This improvement in perception has also been shown to 
carry through to production (Hardison, 2003). All of these research studies have 
found evidence that video is a helpful addition in discrimination task training. 

More research is necessary to examine whether using visual cues will aid or 
hinder the development of aural discrimination categories for vowels in English. 
As previous research indicates, it is likely that extra modeling and visual cues 
may aid in the development of vowel categories. It is possible, however, that 
while students watch videos for training in pronunciation they rely on the visual 
cues (facial movements) to help them determine the vowel. This use of visual 
cues, in turn, could allow students to excel in training activities without 
developing the ability of the ears to truly listen and use spectral cues in 
determining vowels. 

This study aims to investigate the impact of audio vs. video training on subjects’ 
ability to aurally discriminate vowels. In addition, this research aims to examine 
student reactions to the different training delivery methods, which may affect the 
appeal of the exercises. Bagui (1998), for example, found that the introduction of 
animation, sound, and interactivity in lessons increased student motivation. One 
explanation of this increased motivation is that incorporating technology into 
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training increases student autonomy, allowing them to become independent 
learners (Sharma & Barrett, 2007). These authors, however, were examining 
interactive multimedia and it is not clear whether a smaller switch, from audio to 
video, would also change reactions to the training. 

Research Questions 

The purpose of this study is to evaluate the effect of training on the 
discrimination of vowels, specifically, /i/ and /i/ because these vowels do not 
contrast phonemically in many languages and often present problems for 
learners of English. This study aims to answer two questions: 

1. Will the group receiving audio pronunciation training differ from the 
group receiving video pronunciation training in their aural discrimination 
of /i/ and /i/ in the post-test and delayed post-test? 

2. Will students find video training more appealing than audio training? 


Methods 

Participants 

Participants included advanced ESL students enrolled in a college level ESL 
writing class at a large university in the United States. They were assigned to one 
of two groups: 30 to the video training group (80% LI Chinese speakers and 20% 
mixed native languages) and 31 to the audio training group (87% LI Chinese 
speakers and 13% mixed native languages). Although most of the participants 
were native speakers of Chinese, there were three Korean, two Arabic, one Malay, 
one Amharic, one Finnish, one Spanish, and one Indonesian native language 
speaker. Participants ranged in age from 18 to 24, with an average age of 19.67. 
There were 16 female participants and 41 males (four participants declined to 
answer that question.) The amount of English study varied widely among 
participants (1 year to 20 years) with an average English study length of 8.4 
years. The length of time in the U.S.A. also varied from two months to eight years, 
with an average length of stay of approximately 11 months. The formation of 
groups attempted to control for these factors and the pre-test scores aimed to 
equalize proficiency. Table 1 illustrates the make-up of each group including 
averages for each group for length of time studying English, months in the U.S.A., 
and pre-test scores. 
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Table 1. Group Formation Data 



Group 1- Video Training Group 2- Audio Training 

N= 

30 

31 

Native Language 

80% Chinese 

20% Other 

87% Chinese 

13% Other 


M= 18 

M= 23 

Gender 

F= 9 

F= 7 


Non-report= 3 

Non-report= 1 

Years studying English 

8.82 

8.05 

Months studying in U.S.A. 

11.58 

10.44 

Pre-test score 

17.53 

17.6 

SD of pre-test score 

2.43 

2.26 


Materials 

The materials used in this study included a pre-test, post-test (which was also 
used for the delayed post-test), audio and video training materials, a biographical 
data questionnaire, and a student reaction questionnaire that was administered 
after the rest of the study was completed. The pre-and post-test each contained 
twenty listening items. Participants were asked to mark their answers to 
questions about words: 

• Are these two words the same words or different words? 

• Which word of the three is different? 

• Which word of the two has the "e" sound like in "feet”? 

• Does this single word have the "e" sound like in "feet”? 

The targeted sounds in the tests and training materials were the vowel sounds 
/i/ as in "feet" and /i/ as in "fit". Although /i/ exists in many languages of the 
world (Edwards, 1992, p. 215), /i/ is less commonly found, and often presents 
difficulties. Nilsen and Nilsen (2002) point out that speakers from at least 31 
languages are likely to encounter difficulty with this pairing because a distinction 
between the two sounds does not exist in the native language. Learners who may 
have difficulty with these sounds include speakers of Chinese, Spanish, Korean, 
Italian, French, and Russian (p. 1). These vowels are also valuable targets for 
instruction because /i/ and /i/ are very common in English: /i/ is the 4th most 
common vowel and /i/ is the second most common vowel among words in 
English (Edwards, 1992, pp. 215 & 221). Not only are the vowels common, but 
also the distinction between /i/ and / 1 / distinguishes many words in English, 
giving the pairing a high functional load (Catford, 1987, p. 88). Also, because 
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students who received the video saw the visual cues, it was important to ensure 
that there were marked differences in the visual appearance of the pronunciation 
of each sound. For instance, /i/ is pronounced with lips spread wide, almost like 
a smile, whereas the lips in / 1 / are more relaxed and were thus a suitable pair for 
this study. 

The researcher, who was also the source of the voice for all materials, carefully 
recorded each exercise to control for possible length and intonation differences. 
The audio files for the pre- and post-test were recorded through Audacity® 
software (2012). Because previous research has shown that many L2 learners 
will listen for length instead of spectral cues, the files were carefully checked 
before being exported to MP3s to ensure that the length of the word pairs or 
triples was within .02 seconds of each other. If the difference in length was 
greater than .02 seconds, the words were rerecorded. This limit was chosen 
partially due to practical concerns. To avoid cutting or altering the vowels with 
software, I chose to simply rerecord until the vowels were within the chosen 
limit. 

The activities on the tests were specifically chosen in order to avoid an influence 
of spelling. By asking participants to simply make decisions about words, as 
opposed to simply choosing the word they heard, spelling was not involved. For 
that task, participants needed no knowledge about any of the target words in 
order to make decisions about the vowel sounds, which maintains construct 
validity. Also, the post-test was designed so that half of the pairs were previously 
unheard. In other words, the pairs were not included in the pre-test or training 
sessions to give a measure of the generalizability of the findings. To ensure 
equivalence of forms, the items from the pre-and post-test were carefully 
designed. Every word included in the pre-post test was a single, closed syllable. 
Also, attention was paid to equalize the number of vowels that would have 
nasalization or /r/ or /l/ coloring. To check the overall equivalency of forms, the 
items of the pre- and post-tests were mixed together into a single test taken by 
three ESL students who were not participating in the research study. The pre-test 
items were odds and the post-test items were evens. One ESL student scored 
perfectly on both the odd and even items. The other two scored 3-4 and 6-5 (with 
odds being listed first), suggesting that the pre- and post-test items were of 
similar difficulty. 

For each of the two training sessions, participants watched a video (Group 1) or 
listened to an audio file (Group 2.) Activities in the training sessions included 
minimal pair activities because these types of activities have been proven to 
increase aural discrimination of sounds. The video was recorded on a Canon 
Digital SLR and edited in Microsoft Movie Maker (2012). The length of the first 
video was 13 minutes and 22 seconds and the second was 13 minutes and 6 
seconds. To create audio files that were identical in sound and content, the audio 
was then stripped from the video files using a program called Video MP3 
Extractor (GeoVid, 2012). These audio and video materials were provided to 
students for download. 
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The student reaction feedback questionnaire included five Likert scale items 
about the appeal of the training materials (The questionnaire also included two 
open-ended questions to allow for more individualized commentary about the 
training. 

Procedures 

The study consisted of four sessions with the participants. In Session 1, 
participants were given information about the study and signed a participant 
consent form. Then, they completed the biographical subject data questionnaire. 
Before subjects completed the pre-test, the volume level and sound quality was 
analyzed through a test file. The file was audible to all subjects in the room. In 
Sessions 2 and 3, the participants watched or listened to the first training file. 
After listening to the training in Session 3, the participants took an immediate 
post-test. Session 4 took place one week after the third session. In order to check 
the long-term effects of the training, the participants took a delayed post-test 
during Session 4 and then completed the student feedback questionnaire. The 
questionnaire was given at the end of the study because one of the open-ended 
questions asked participants to give their opinion about the training the other 
group received. 

Analysis 

The responses to the biographical data questionnaire were primarily used to 
create equivalent groups and give an indication of the generalizability of the 
results. 

Research question 1 

The pre-, post-, and delayed post-tests were used to answer the first research 
question, whether the groups would differ in performance due to different 
training. The pre- and post-tests were scored; 1 point was given for each correct 
answer and 0 points were given for each incorrect answer. Then an overall score 
was totaled and assigned. Because all subjects took all three tests, it was possible 
to analyze the data using a mixed ANOVA. 

Exclusion of subjects from analyzed data 

Some participants had to be excluded from the analyzed data for the pre-test, 
post-test, and delayed post-test comparisons. There were two reasons for 
exclusion: either a participant did not complete all four sessions of the research 
study, which eliminated 12 participants, or a participant received a perfect score 
on the pre-test. The rationale for the second exclusion possibility is that 
improvement due to training would not be visible in either post-test, which 
occurred in eight cases. This resulted in 21 participants in the audio group and 
19 in the video group. 


Research question 2 
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The student feedback questionnaire was used to address the second research 
question, which considers student reactions to the appeal of the training. All 
participants who completed the questionnaire and both training sessions were 
included in the analyzed questionnaire data (n=54). Reactions to the Likert scale 
items were scored on a 1-5 range, with 5 representing a strong agreement to the 
claim and a 1 representing a strong disagreement to the claim. The average score 
for each item was calculated for comparison. Responses to the open-ended 
questions were coded and examined for common themes and types of responses. 


Results 

Participant Improvement from Pre- to Post- Tests 

Results show that both groups responded similarly to training; they both showed 
statistically significant improvement (p= .000) from the pre-test to the post-test 
with an effect size of .70. Despite a decline in the average score from the post-test 
to the delayed post-test, students maintained a significant improvement from the 
pre-test to the delayed post-test (p= .008, effect size= .39). The decrease from the 
post-test to the delayed post-test was not found to be statistically significant. 
Table 2 reports the average scores for each group at each testing time. 

Table 2. Scores for Pre-, Post-, and Delayed Post-tests by Group 



Video Group Audio Group 


Average 

SD Average SD 

Pre-test 

16.63 

2.52 16.95 2.01 

Post-test 

18.11 

1.85 18.24 1.55 

Delayed post-test 17.47 

3.04 18.00 2.07 


Although the percent improvement for the audio group was slightly higher than 
the video group (7.27% versus 6.43%) when comparing the pre-test to post-test, 
this difference was not found to be statistically significant. Also, the decrease in 
score from the post-test to the delayed post-test for the audio group was slightly 
lower than for the video group as well (1.2% versus 3.2%). However, this was 
also found to be not statistically significant. Figure 1 shows the average score for 
each group on each of the three tests. 
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Figure 1. Scores for Pre-, Post-, and Delayed Post-tests by Group 


Appeal of the Training 

Overall, the video group provided higher scores for every item in the 
questionnaire. Responses from the two groups differed most in response to the 
claim, "I would like to do more training like this." The average score given by the 
video group was 0.8 points higher than the audio group. The smallest difference 
between the groups was found in response to the claim: "The instructions for 
each activity were clear." in which the video group gave an average score only 
0.12 points higher than the audio group. Table 3 shows the five claims presented 
to students and the average score given for each item. 


Table 3. Scores to Likert Scale Questionnaire Items by Group 

Video Audio 

Group Group 



Average SD Average SD 

The instructions for each activity were clear 

4.55 

0.51 4.43 

0.68 

The quality of the recordings was high 

4.50 

0.51 4.00 

0.95 

I feel that my ability to hear vowel differences has 
improved 

3.75 

0.55 3.57 

0.75 

I feel the training was interesting 

3.55 

0.76 3.38 

0.74 

I would like to do more training like this 

3.85 

0.81 3.05 

1.20 


TESL-EJ 16.2, September 2012 


McCrocklin 


'10 













The two usability questions about the clarity of instructions and recording 
quality were given higher scores and the other two appeal scores about the 
training being interesting and the ability to hear vowels after training were given 
lower scores. For the audio group question, however, this item received the 
lowest score. This relationship can be better illustrated in Figure 2. 



Clarity of Recording Ability to Training was Want to do 
instructions quality distinguish interesting more training 

vowels 


■ Video Group 
Audio Group 


Figure 2. Scores to Likert Scale Questionnaire Items by Group 


In response to the open-ended question, only five of the video group thought that 
the audio would have been better. In contrast, 12 members of the audio group 
thought that video would have been better. Only eight members of the audio 
group thought that the video would be worse, whereas 13 members of the video 
group thought that the audio would be worse. Ten members of the video group 
responded that they were not sure, gave unclear answers, or said they thought 
the two delivery methods were equal. This also happened in six cases in the 
audio group. 

Of the participants that thought video was better, 12 indicated that seeing the 
mouth movements was helpful for understanding how the sounds were made. 
One participant wrote, "I think [audio] would've been worse. I could learn how 
lips are different when pronounce different vowels through video." Interestingly, 
one participant even made a comment similar to the dual processing theory. He 
said, "[Video is better] because they use two senses to gain the knowledge." This 
utilization of visual cues was also mentioned as a limitation by the participants 
who thought audio would be more useful. Three participants mentioned that 
because they could get the answers by looking at the different visual cues, audio 
would be better training. Four other participants answered more generally that 
because the goal was to improve listening, audio would be better. One participant 
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said, "[Video is] worse. This activity is focus on listening. The audio is focus on 
listening. The video might make people focus on the screen." Also, two 
participants specifically mentioned that they thought the video would be 
distracting. 

Although most of the participants responded to the question of which medium is 
better in terms of improvement four participants responded in terms of appeal. 
They stated that the video would be better because it is more interesting (two 
from each group). One participant stated, "I think video must be more interesting 
and attractive than just audio files." 

In response to the question, "How do you think this training could be improved?" 
the most common response was that the training needed more difficult items and 
activities (six for video group and seven for audio group). Also, the next most 
common comment for each group was that the training needed more items and 
questions. Interestingly, although in the previous question, four people indicated 
that the video would be more interesting, more people in the video group 
indicated that the training could be improved by making it more interesting. 
Another interesting finding was that two members of the audio group wanted 
clearer directions, but none of the video group members indicated this. Finally, 
one member from each group indicated that they wanted personalized feedback 
from the training. 

Discussion 

This study produced three main findings. First, similar to the numerous previous 
research studies (such as Bradlow, Pisoni, Yamada, & Tohkura, 1997 and Logan, 
Lively, & Pisoni, 1991) this study found significant improvement due to training 
with discrimination tasks. Despite the fact that none of the participants were 
observed re-listening or re-watching parts of the training and despite the fact 
that they received no personalized feedback, students were able to improve. For 
teachers, this means that these listening activities can be done outside of class as 
homework and class time can be utilized more efficiently. 

More importantly, however, in contrast to previous research (Hardison, 2003; 
Hardison, 2005; Hirata & Kelly, 2010) the introduction of video versus audio 
seems to have made little to no difference. This finding does not support the 
information processing theory, which claims that audio plus video would allow 
for dual coding and better storing and accessing of new information (Bagui, 

1998; Treichler, 1967). Instead, it seems that the audio group, which only had 
available one route to store and access information, slightly outperformed the 
video group. They made slightly higher improvements on the post-test and less 
improvement on the delayed post-test. This may indicate that subjects receiving 
video training do rely on facial gestures for making decisions about words 
(similar to MacDonald and McGurk, 1978), therein inhibiting their development 
of aural discrimination categories. As previously mentioned, however, the 
differences between the audio and video group were not found to be statistically 
significant. For teachers, this means that training for English vowels can be done 
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through either method. For most teachers, audio recording, which can be done 
with free software such as Audacity, will be less time consuming and less 
expensive. 

Finally, although the two training types produced similar results in terms of 
participant improvement, reactions were generally more favorable to the video 
training. This finding supports previous research that has shown that the use of 
multimedia can increase student motivation (Bagui, 1998; Sharma & Barrett, 
2007). It seems that the change from audio to video can also produce changes in 
attitudes and reactions towards training. For teachers, this would suggest that by 
incorporating video into instruction (at least occasionally) teachers may be able 
to offset feelings of monotony and increase student interest. 

Nonetheless, it is important that all findings be considered in light of the 
numerous limitations of this study. One of the main problems encountered with 
this study was the ceiling effect caused by numerous high scores on the pre-test. 
Over 75% of the original 61 participants scored a 17 or higher on the pre-test 
(out of a possible 20). This left little room for visible improvement. It may be 
possible that, with a more sensitive pre-test, differences could have been found. 

Another limitation was the time on task for the training. Because the activities 
were repetitive, the length of each training recording was quite short, lasting 
approximately 13 minutes for each recording. By incorporating more training 
activities and more training time, it is possible that differences could have been 
found. Also, although this study began with 61 participants, data from 20 
participants could not be used to analyze improvement from the pre-test to the 
two post-tests. With only 41 subjects, the generalizability of the results is 
somewhat questionable. Similar future research is necessary to confirm these 
findings. 

Despite the limitations of this study, the results suggest that this matter of 
whether to use audio or video training is not conclusively decided. Future 
research should not only aim to replicate these findings, but also expand the 
scope of inquiry to include more pronunciation features. Thus far, only one 
English vowel pairing, /i/-/i/, and one English consonant pairing, /l/-/r/, has 
been investigated [citation?]. Yet there are many other pairings that have clear 
differences in visual cues, such as /a/ vs. /a/ or /0/ vs. /t/ that could add to the 
understanding of the impact of the visual cues. 

Because there has previously been relatively little interest in this area of 
research, there are many possible directions to go with this type of research. 
Although researchers have recently started to turn their attention to the effects 
of audio vs. video training for teaching pronunciation, many questions are left 
unanswered. As teachers are already employing these modes of delivery in their 
classrooms and as homework, it is important that further research be conducted 
to determine the effects of these two methods of training delivery. 
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